feat: Integrate Chutes API with Kimi K2.5-TEE model #7
base: main
Conversation
- Add ChutesClient class for Chutes API (https://api.chutes.ai/v1)
- Support CHUTES_API_KEY environment variable for authentication
- Set moonshotai/Kimi-K2.5-TEE as default model
- Enable thinking mode by default with `<think>...</think>` parsing (see the sketch after this list)
- Use Kimi K2.5 recommended parameters (temp=1.0, top_p=0.95)
- Increase context limit to 256K tokens
- Add openai>=1.0.0 dependency for the OpenAI-compatible API client
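For illustration, a minimal sketch of what `<think>...</think>` parsing can look like; the function name and exact handling are assumptions, not the PR's actual implementation:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model reply into (thinking, visible) parts.

    Hypothetical helper: extracts the first <think>...</think> block
    and returns the remaining text as the visible answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text
    thinking = match.group(1).strip()
    visible = (text[:match.start()] + text[match.end():]).strip()
    return thinking, visible
```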
📝 Walkthrough
The pull request introduces multi-provider LLM support by adding a Chutes API client with thinking-mode capabilities as the default provider, alongside OpenRouter as a fallback. The implementation includes a factory function for provider selection, updated configuration defaults for the Kimi K2.5-TEE model, and extended cost/token tracking across both providers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as Agent (main)
    participant Config as CONFIG
    participant Factory as get_llm_client()
    participant ChutesC as ChutesClient
    participant LiteLLMC as LiteLLMClient
    participant API as Chutes/OpenRouter API

    Agent->>Config: read provider setting
    Config-->>Agent: provider = "chutes" (or fallback)
    Agent->>Factory: get_llm_client(provider, model, cost_limit, enable_thinking)
    alt provider == "chutes"
        Factory->>ChutesC: instantiate with auth, thinking_mode
        ChutesC-->>Factory: client ready
    else provider == "openrouter"
        Factory->>LiteLLMC: instantiate with litellm config
        LiteLLMC-->>Factory: client ready
    end
    Factory-->>Agent: llm_client
    Agent->>ChutesC: chat(messages, temperature, max_tokens)
    ChutesC->>API: request (with thinking mode params)
    API-->>ChutesC: response (thinking + content)
    ChutesC->>ChutesC: extract thinking_content
    ChutesC-->>Agent: LLMResponse(thinking, cost, usage)
    Agent->>Agent: run agent loop with response
```
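The factory step in the diagram can be pictured roughly as follows. This is a sketch assuming the signature shown in the diagram; the actual code in `src/llm/client.py` may differ:

```python
from typing import Optional

def get_llm_client(provider: str, model: str,
                   cost_limit: Optional[float] = None,
                   enable_thinking: bool = True):
    """Hypothetical provider-selection factory mirroring the diagram.

    ChutesClient and LiteLLMClient are the two clients named in the
    walkthrough; the fallback behavior here is assumed.
    """
    if provider == "chutes":
        return ChutesClient(model=model, cost_limit=cost_limit,
                            enable_thinking=enable_thinking)
    if provider == "openrouter":
        return LiteLLMClient(model=model, cost_limit=cost_limit)
    raise ValueError(f"Unknown LLM provider: {provider}")
```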
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ 1 failed (warning) | ✅ 2 passed
Part of Umbrella PR: #6 (Epic: Complete Chutes API Integration). This PR is the first step in the stacked PR sequence; please see #6 for the complete merge strategy.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/llm/client.py`:
- Around line 102-164: The ChutesClient currently only reads CHUTES_API_TOKEN so
users setting CHUTES_API_KEY (as documented) will get an auth error; update the
token retrieval in ChutesClient.__init__ to accept either environment variable
(check CHUTES_API_TOKEN first, then CHUTES_API_KEY or vice versa) and set
self._api_token accordingly, and update the raised LLMError message to reference
both env var names; ensure the later OpenAI client initialization still uses
self._api_token.
🧹 Nitpick comments (1)
pyproject.toml (1)
30-30: Consider consolidating dependency declarations.
`openai>=1.0.0` is declared in both `requirements.txt` and `pyproject.toml` with matching versions. If both files are intentional (e.g., for tool compatibility or development workflows), keep the two declarations aligned as standard practice.
```python
class ChutesClient:
    """LLM Client for Chutes API with Kimi K2.5-TEE.

    Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
    Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.

    Environment variable: CHUTES_API_TOKEN

    Kimi K2.5 parameters:
    - Thinking mode: temperature=1.0, top_p=0.95
    - Instant mode: temperature=0.6, top_p=0.95
    - Context window: 256K tokens
    """

    def __init__(
        self,
        model: str = CHUTES_DEFAULT_MODEL,
        temperature: Optional[float] = None,
        max_tokens: int = 16384,
        cost_limit: Optional[float] = None,
        enable_thinking: bool = True,
        # Legacy params (kept for compatibility)
        cache_extended_retention: bool = True,
        cache_key: Optional[str] = None,
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
        self.enable_thinking = enable_thinking

        # Set temperature based on thinking mode if not explicitly provided
        if temperature is None:
            params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
            self.temperature = params["temperature"]
        else:
            self.temperature = temperature

        self._total_cost = 0.0
        self._total_tokens = 0
        self._request_count = 0
        self._input_tokens = 0
        self._output_tokens = 0
        self._cached_tokens = 0

        # Get API token
        self._api_token = os.environ.get("CHUTES_API_TOKEN")
        if not self._api_token:
            raise LLMError(
                "CHUTES_API_TOKEN environment variable not set. "
                "Get your API token at https://chutes.ai",
                code="authentication_error"
            )

        # Import and configure OpenAI client for Chutes API
        try:
            from openai import OpenAI
            self._client = OpenAI(
                api_key=self._api_token,
                base_url=CHUTES_API_BASE,
            )
        except ImportError:
            raise ImportError("openai not installed. Run: pip install openai")
```
Support the documented CHUTES_API_KEY env var to prevent auth failures.
The client only checks CHUTES_API_TOKEN. If users follow the documented CHUTES_API_KEY, auth will fail. Accept both.
🔧 Suggested fix

```diff
-        self._api_token = os.environ.get("CHUTES_API_TOKEN")
+        self._api_token = (
+            os.environ.get("CHUTES_API_KEY")
+            or os.environ.get("CHUTES_API_TOKEN")
+        )
         if not self._api_token:
             raise LLMError(
-                "CHUTES_API_TOKEN environment variable not set. "
+                "CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
                 "Get your API token at https://chutes.ai",
                 code="authentication_error"
             )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
class ChutesClient:
    """LLM Client for Chutes API with Kimi K2.5-TEE.

    Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
    Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.

    Environment variable: CHUTES_API_TOKEN

    Kimi K2.5 parameters:
    - Thinking mode: temperature=1.0, top_p=0.95
    - Instant mode: temperature=0.6, top_p=0.95
    - Context window: 256K tokens
    """

    def __init__(
        self,
        model: str = CHUTES_DEFAULT_MODEL,
        temperature: Optional[float] = None,
        max_tokens: int = 16384,
        cost_limit: Optional[float] = None,
        enable_thinking: bool = True,
        # Legacy params (kept for compatibility)
        cache_extended_retention: bool = True,
        cache_key: Optional[str] = None,
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
        self.enable_thinking = enable_thinking

        # Set temperature based on thinking mode if not explicitly provided
        if temperature is None:
            params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
            self.temperature = params["temperature"]
        else:
            self.temperature = temperature

        self._total_cost = 0.0
        self._total_tokens = 0
        self._request_count = 0
        self._input_tokens = 0
        self._output_tokens = 0
        self._cached_tokens = 0

        # Get API token
        self._api_token = (
            os.environ.get("CHUTES_API_KEY")
            or os.environ.get("CHUTES_API_TOKEN")
        )
        if not self._api_token:
            raise LLMError(
                "CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
                "Get your API token at https://chutes.ai",
                code="authentication_error"
            )

        # Import and configure OpenAI client for Chutes API
        try:
            from openai import OpenAI
            self._client = OpenAI(
                api_key=self._api_token,
                base_url=CHUTES_API_BASE,
            )
        except ImportError:
            raise ImportError("openai not installed. Run: pip install openai")
```
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 124-124: Unused method argument: `cache_extended_retention` (ARG002)
[warning] 125-125: Unused method argument: `cache_key` (ARG002)
[warning] 149-153: Avoid specifying long messages outside the exception class (TRY003)
[warning] 163-163: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
[warning] 163-163: Avoid specifying long messages outside the exception class (TRY003)
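For the B904 finding, a minimal sketch of the chained-exception form applied to the import guard above (message text unchanged):

```python
# Chaining the original ImportError (B904) keeps the real cause in the
# traceback instead of reporting it as an error raised during error handling.
try:
    from openai import OpenAI
except ImportError as err:
    raise ImportError("openai not installed. Run: pip install openai") from err
```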
Summary
This PR integrates the Chutes API with the Kimi K2.5-TEE model for the agent.
Changes
- Enable thinking mode with `<think>...</think>` parsing
Testing
```bash
python3 -c "from src.llm.client import ChutesClient; print('OK')"
```
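Beyond the import smoke test, a hypothetical end-to-end check might look like this; it assumes `CHUTES_API_KEY` (or `CHUTES_API_TOKEN`) is set and that `chat()` and `LLMResponse` match what the sequence diagram shows:

```python
from src.llm.client import ChutesClient

client = ChutesClient()  # defaults: moonshotai/Kimi-K2.5-TEE, thinking mode on
response = client.chat(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=256,
)
print(response.thinking)  # text extracted from <think>...</think>
print(response.content)   # visible reply (attribute name is an assumption)
```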